WEBVTT 00:00.000 --> 00:06.760 This image PR episode 2077 titled Managing Time on HDR Episode 3. 00:06.760 --> 00:12.360 It is posted by name Moris and in about 32 minutes long and Karimanxly's its flag. 00:12.360 --> 00:13.360 The summary is. 00:13.360 --> 00:18.760 Looking for the best way to store and Managing Time in the HDR Nature Base, Part 3. 00:18.760 --> 00:23.160 This episode of HDR is brought to you by an honest host.com. 00:23.160 --> 00:29.560 Get 15% discount on all shared hosting with the offer code HDR15. 00:29.560 --> 00:32.000 Let's HDR15. 00:32.000 --> 00:57.360 Get your web hosting that's honest and fair at an honest host.com. 01:02.000 --> 01:18.440 Hello everybody, welcome to Hacker Public Radio. 01:18.440 --> 01:23.440 My name is Dave Morris and today I'm finishing a mini series. 01:23.440 --> 01:29.920 I started a few weeks ago on the subject of managing tags on HDR episode. 01:29.920 --> 01:33.960 This is the third episode of that mini series. 01:33.960 --> 01:37.960 So you'd probably be relieved to know that it is the last ride. 01:37.960 --> 01:42.480 It's haven't had much in the way of feedback, though I've had a few comments about help 01:42.480 --> 01:46.000 with database design and so forth which is great. 01:46.000 --> 01:52.720 In the first show we looked at the whole subject of tags and why we should need them. 01:52.720 --> 01:59.520 We need them with HDR shows and we looked at the way that we're currently storing them. 01:59.520 --> 02:02.400 And what's good and bad about that, mostly bad. 02:02.400 --> 02:07.440 The major drawbacks are with searching was what was concluded. 02:07.440 --> 02:12.000 It's the sort of thing that you might sort of design you might come up with without really 02:12.000 --> 02:17.120 fully getting the idea of what databases are for and what they can do. 02:17.120 --> 02:23.040 Hopefully this series will help to fill in some of those gaps as much as I can do that. 02:23.040 --> 02:25.280 Anyway, not being a database expert. 02:25.280 --> 02:33.360 In the second show we looked at how we could make a simple tags table and query it and thereby 02:33.360 --> 02:36.160 get a better overall effect. 02:36.160 --> 02:41.440 In particular, we defined some of the things we wanted to be able to do with tags, partly 02:41.440 --> 02:49.800 based around one of Drupes's comments back in the episode he did about the way to improve 02:49.800 --> 02:50.800 HDR. 02:50.800 --> 02:58.640 We are going to, in this last show, look at a more written here rigorous, efficient, normalised 02:58.640 --> 02:59.640 solution. 02:59.640 --> 03:08.200 So I think this is the way that somebody trained in database design would do things. 03:08.200 --> 03:14.800 And I've worked up to this over a couple of other shows, really just to air the whole 03:14.800 --> 03:21.040 subject of how one designs this type of thing for people who maybe not that experience in 03:21.040 --> 03:22.040 this thing. 03:22.040 --> 03:26.880 I'm not that experienced, but I do have come across this sort of thing in the past and implemented 03:26.880 --> 03:27.880 it. 03:27.880 --> 03:32.720 Now, one of the drawbacks with the method that we looked at the last time was that our 03:32.720 --> 03:37.480 tag table had multiple instances of the same tag. 03:37.480 --> 03:44.200 And it also doesn't conform to the accepted database design recommendations. 03:44.200 --> 03:50.800 So those two things really, well, you could get away with it, it's not really a good 03:50.800 --> 03:51.800 reason to do so. 03:51.800 --> 03:56.600 As far as I can see anyway, although I did cite some instances of people who had implemented 03:56.600 --> 04:02.320 such things in various services on the web, I don't think it's really the best way 04:02.320 --> 04:03.320 of doing things. 04:03.320 --> 04:09.400 In particular, the design in the last show doesn't really reflect the relationship 04:09.400 --> 04:14.520 between the HPR episodes and the tags that are associated with. 04:14.520 --> 04:19.040 And this relationship is what's referred to as many to many. 04:19.040 --> 04:24.640 What this means is that a given episode may have many tags, not very surprisingly, and 04:24.640 --> 04:28.520 a given tag may be associated with many episodes. 04:28.520 --> 04:37.960 Now Mike Ray covered this subject very well in episode 1,569 called many to many data 04:37.960 --> 04:39.800 relationship, how to. 04:39.800 --> 04:45.240 It covers the subject really, really well, but I'm not sure that the audience was quite 04:45.240 --> 04:49.440 ready for to hear hear such a thing, I think. 04:49.440 --> 04:55.240 We could definitely do with more database related shows for the sake of people who might 04:55.240 --> 04:57.560 be wanting to get more into databases. 04:57.560 --> 05:02.000 Certainly, I would recommend that you listen to Mike Ray's show if you haven't done 05:02.000 --> 05:09.360 this already, or if you've forgotten what he said, I've certainly listened and read his 05:09.360 --> 05:14.240 wonderful notes since then, because it does give a very good overview of how you would do 05:14.240 --> 05:15.240 this type of thing. 05:15.240 --> 05:22.680 So in the many to many design, one copy of each tag would be held in a tags table, and there 05:22.680 --> 05:30.080 would be a second table, which is linking or cross referencing the tags and episodes in 05:30.080 --> 05:34.640 our particular case, I mean, many to many is using many contexts, but in the one that 05:34.640 --> 05:37.560 we're looking at here, we're talking about episodes and tags. 05:37.560 --> 05:45.040 So I drew a very simplistic diagram using open office, Libra Office Draw, which has its 05:45.040 --> 05:48.480 limitations for the sort of thing, but hopefully it gets a message across. 05:48.480 --> 05:54.520 I've shown an example where we've got an episode's table, we're imagining that there's 05:54.520 --> 06:03.200 a show 1, 2, 3, 4, which I've just shown, that box says ID, 1, 2, 3, 4 in it, then that is 06:03.200 --> 06:10.320 associated with the tag banana, obviously I'll show about bananas, and the way it's 06:10.320 --> 06:16.520 done is that there's a joining table that says that contains a record that says episode 06:16.520 --> 06:24.440 in episode reference, ID, 1, 2, 3, 4 in the episode's table, and it's got a tag reference, 06:24.440 --> 06:31.400 so there's a tag's table where the tags have an index number and we have that number 06:31.400 --> 06:38.040 in the joining table, then it shows picture of the tag's table with tag ID, 4, 5, 6, 06:38.040 --> 06:43.480 I wasn't very imagined tip of these numbers, and it's associated with the tag banana. 06:43.480 --> 06:48.440 So if you like looking at pictures of things like this, you might find that useful. 06:48.440 --> 06:54.360 So what I did in order to demonstrate this and just prove to myself that this was a viable 06:54.360 --> 06:58.200 thing to do and that I fully understood it, and this is some time ago I wrote this. 06:58.200 --> 07:04.280 I used the comma separated list that we already have in the episode's table, which is called 07:04.280 --> 07:13.320 apps for some reason, so typing presumably, and I used the contents to populate new tables 07:13.320 --> 07:19.320 that I created. I did the population with a poll script, and I've included the poll script 07:19.320 --> 07:27.960 in with this show, just really for reference for completeness. I will look at it in brief a bit 07:27.960 --> 07:35.400 later on, but as part of this, there's particular episode I've also included the SQL or SQL 07:35.400 --> 07:42.520 definition of the table, and it's a file that you can download if you're interested, 07:42.520 --> 07:49.720 I call normalize tags too. I'm also listed in the notes in the long notes, and it just 07:49.720 --> 07:56.840 just in brief, it contains a new table of tags, which I call tags too, that's because already 07:56.840 --> 08:01.720 had a table called tags for the other method. You would call it something else if you were 08:01.720 --> 08:07.400 doing this for real. Remember this is experimentation. It contains single instances of tags. 08:07.400 --> 08:16.520 I did make it hold the mixed case and lowercase versions of the tag, but since I did that, 08:16.520 --> 08:23.240 I think in my SQL and MariaDB, I discovered that it's easy just to store one, because 08:23.960 --> 08:30.040 it doesn't, you have to go to some length to make it check the case of words and stuff. 08:30.040 --> 08:34.280 That wouldn't be the case with other database systems, but it is with this. There's a joining 08:34.280 --> 08:40.680 table here, which I called apps underscore tags to underscore x-reff. The convention is, 08:40.680 --> 08:44.920 and this is something that Mike Ray mentioned as well. The fact that it's joining two tables, 08:44.920 --> 08:50.120 it's good to give it a name that references the two tables, and then follow it with x-reff, 08:50.120 --> 08:55.880 meaning that it's across referencing tables. It's easy when you come to look at a database 08:55.880 --> 09:03.720 to work out what the thing is from its name. This table contains just two columns. One is called 09:03.720 --> 09:11.560 app's ID, and the other one is called tags to ID. So app's ID is the key of one of the rows in the 09:11.560 --> 09:16.200 app's table. So in other words, it's an app, so a number, because that's the key to this table. 09:16.200 --> 09:23.320 Tagged to ID is the equivalent of a reference to a row in the tags to table. 09:23.320 --> 09:29.400 One of the key things that you need to do when creating such a cross-reference or joining table 09:29.400 --> 09:39.560 is that you need to give it a unique index, which is a structure which stores pointers to 09:39.560 --> 09:46.520 the rows in the table effectively. And in this particular case, it combines the two columns 09:46.520 --> 09:54.680 to make the key, and it's defined as a unique index. That means that you cannot have a case where 09:54.680 --> 10:01.720 they're the same episode and the same tag is repeated. It doesn't make any sense to do that, but 10:01.720 --> 10:07.960 since databases can control these things, then it's a good idea too to make that the case. 10:07.960 --> 10:15.320 It's in to enforce it this way. So you can imagine in this table there will be multiple instances 10:15.320 --> 10:23.560 of the episode number, because a given episode is entirely likely to have multiple tags, 10:23.560 --> 10:30.120 and for each instance of that episode number, there will be a different tag number 10:31.480 --> 10:39.000 referencing into the tags to table. I also created an index a unique index on the 10:39.000 --> 10:47.880 tags to table, which just ensures that it's impossible to add the same tag twice into that table. 10:47.880 --> 10:53.560 Of course otherwise the database would let you doing this means that you can't. Then there's a 10:53.560 --> 10:59.240 further index are created, which is probably not necessary, which are called all tags, 10:59.240 --> 11:07.960 and it indexes the tags to ID column of the cross referencing table. That's because when 11:07.960 --> 11:13.640 managing the tags, if a tag gets deleted for whatever reason, maybe it was misspelled, 11:13.640 --> 11:18.280 you want to change it for some reason, and effectively have to delete the oil and replace it with 11:18.280 --> 11:27.000 the new, then deletion is done if it's done external to the database, then the index helps to 11:27.000 --> 11:31.480 to speed things up. I'll come on to that in a little bit more detail later on. 11:31.480 --> 11:37.560 One of the things that Mike mentioned in his show was the use of so-called foreign keys. 11:37.560 --> 11:44.440 Now MariaDB or mySQL and all the various other databases are so-called relational databases, 11:44.440 --> 11:50.040 which means that the relationship between table or entities, if you want to call them that, 11:50.040 --> 11:57.960 can be defined by various components of the definition. This wants lots of further 11:57.960 --> 12:04.600 tutorial episodes to fully grasp, but bear with me. So the thing so-called referential 12:04.600 --> 12:14.680 integrity is managed through foreign keys. But by default, mySQL before and MariaDB now 12:15.160 --> 12:20.120 don't support foreign keys. So if you create a table, you don't get the ability to create 12:20.120 --> 12:25.240 foreign keys on it. I think that's true, and at one point mySQL, all the predecessor, 12:25.240 --> 12:31.720 MariaDB, couldn't actually do foreign key relationship at all. That's the reason I never used it 12:31.720 --> 12:38.200 in my work, because the time I wanted to do that, I wanted to be able to use foreign keys. 12:38.200 --> 12:44.680 There are different types of table you can create, and I've noted down here that by default, 12:44.680 --> 12:49.800 you get one which doesn't support these features. You have to ask for it explicitly. 12:49.800 --> 12:55.480 At the moment, we don't have any of these types of table that can do this stuff in the 12:55.480 --> 13:03.320 HPI database. So I've not implemented this in my example, though it's something we should be doing. 13:03.320 --> 13:08.760 So a foreign key then is a way of showing relationships between database table, 13:08.760 --> 13:14.520 the data within them, making a table dependent on another effect. So the field 13:14.520 --> 13:22.440 app's ID in the cross-reference table is an episode ID number in the app's table, and it should only 13:22.440 --> 13:29.160 contain episode ID numbers which match episode numbers in the app's table. You can't just add 13:29.160 --> 13:37.560 999 in there, because it's saying I am referencing an existing entity in the 13:37.560 --> 13:44.200 an existing row in the app's table. If you define it as a foreign key, then the database itself 13:44.200 --> 13:50.600 will say no, you can't do this because this doesn't exist in the other table. So the tags 13:50.600 --> 13:57.320 to ID field in the same table would also be a foreign key pointing into the tags to table. 13:57.320 --> 14:02.920 And again, the database would constrain what can be placed there. One of the other things that 14:02.920 --> 14:09.960 foreign keys can do is that it can ensure that if you delete something, like you delete a tag, 14:09.960 --> 14:19.720 you delete the final reference to a tag, then you can get a database to manage all of the 14:19.720 --> 14:29.000 deletion. So that the sort of scenario would be, if you had a tag banana referred to by an episode, 14:29.000 --> 14:36.120 then if you deleted that tag in the sense that you said, this was a mistake, it's no longer 14:36.120 --> 14:41.880 to be associated. You want to delete the tag from the tag table. You also want to delete the 14:41.880 --> 14:51.080 cross-reference. So the maintenance of integrity is a feature called cascading deletion, 14:51.080 --> 15:00.040 which means that you can't delete a cross-reference entry without the tag also being deleted. 15:00.040 --> 15:04.920 You don't end up with tags sitting in the tag table, which don't belong anywhere. 15:04.920 --> 15:12.280 Often, I think is the term. Certainly the term I am inclined to use and Mike similarly, I noticed. 15:12.280 --> 15:17.960 So I'm not sure I explained that very well, but Mike did a much better job of it, and so I 15:17.960 --> 15:23.480 again refer you back to episode 15, 69 if you want to understand this more. 15:23.480 --> 15:28.680 So I've added in here note about the pulse script that I wrote, which I called, 15:28.680 --> 15:35.000 refresh underscore tags underscore 2. It's quite a complicated script, so I'm not 15:35.000 --> 15:38.760 I'm not going to go into detail to about it. It's not really an hugely relevant here, 15:38.760 --> 15:44.520 but I thought I would just put it make it available in case anybody wanted to read it and 15:44.520 --> 15:51.400 understand it. In just to explain it in three quick paragraphs, it scans the app's table 15:51.400 --> 15:58.440 in the database, collecting all of the tags stored in CSV form, and it stores them away with 15:58.440 --> 16:05.640 the episode number they belong to. It also collects the tags stored in the tags two table if there 16:05.640 --> 16:13.320 are any, and stores them again associated with an episode number. So it's made two tables of information 16:13.320 --> 16:21.160 episode number and the tags that are associated with it in the two tables, remembering that 16:21.160 --> 16:28.120 this depends on the CSV list being available. It can then compare two sets of 16:28.120 --> 16:35.960 tags and look for differences. So if a new tag is appeared in the CSV list, it can add it to the 16:35.960 --> 16:43.960 tags two table. And if for some reason it's no longer there, then it can delete it. And it manages 16:43.960 --> 16:52.120 the joining table along with the tags two table to achieve this. I've noted that the script performs 16:52.120 --> 16:58.520 actions that the database itself would carry out if we used foreign keys and so forth. And so the 16:58.520 --> 17:04.520 deletion in particular deletion stuff. And again again in my notes made a reference to the fact 17:04.520 --> 17:10.760 that we really need to implement the full database capabilities in order to get this stuff. 17:10.760 --> 17:18.440 So moving on to the advantages and disadvantages of this method. Well first of all, this is the 17:18.440 --> 17:24.680 most efficient way of storing the tags. We only store one instance of everything. It's obviously 17:24.680 --> 17:31.400 vastly preferable to the common separated variable method, which we looked at in episode one. 17:31.400 --> 17:39.000 And it's also preferable to the method in the last episode because the same tags stored on 17:39.000 --> 17:42.920 you once. So if you want to make a spelling correction to a tag, you don't have to go and make it 17:42.920 --> 17:48.280 several times for example. And if we have the full relational database capabilities 17:48.280 --> 17:55.400 foreign keys, cast it, catering, deletion and so forth, we can use the database capabilities 17:55.400 --> 18:01.800 to help manage this type of structure. It's what databases are designed to do. The disadvantage 18:01.800 --> 18:08.440 and everyone enlisted one really. This is the best in terms of database design, but the concept of 18:08.440 --> 18:14.840 how it works. And this sort of way in which you manage it have become more complicated as a consequence. 18:14.840 --> 18:21.560 But I don't believe that that's really an issue, especially since you can write scripts 18:21.560 --> 18:26.920 or similar types of things. You can even write so-called stored procedures in the database to help 18:26.920 --> 18:33.080 you manage this sort of stuff. So I don't think that should be a criterion for rejecting it. 18:33.080 --> 18:40.040 So as in the previous episode, let's look at the ways in which this method could be used for 18:40.040 --> 18:46.280 searching. Now I've put quite a lot of detail in the notes here, but I won't talk about them in 18:46.280 --> 18:52.520 great detail. I'll leave them for you to examine if you're interested. The first one is how would you 18:52.520 --> 18:58.600 find all the shows that have a given tag? And I'm using the example we used originally, which was 18:58.600 --> 19:04.680 look for shows with the tag community. The one note I've made here is that since I've 19:04.680 --> 19:11.560 created the notes the last show, I discovered that the method I used to generate show notes, 19:11.560 --> 19:17.960 which uses a templating system, has the capability of making database queries within it. 19:17.960 --> 19:23.400 So I've actually done that within my notes, so my notes themselves actually querying my 19:23.400 --> 19:31.080 copy of the HPR database. And because I can do that, I can also generate HTML tables to show the 19:31.080 --> 19:37.880 results. So I've done that, and hopefully it's a bit clearer to read. I think it is, anyway, 19:37.880 --> 19:43.320 previously I was listing things which were too wide for the page. So you had to scroll sideways to 19:43.320 --> 19:50.760 see them HTML tables wrap in sensible ways, so I think it is preferable. We're not a color thing 19:50.760 --> 19:58.120 with them, that sort of stuff. So the query that is, the example of how you would do this particular 19:58.120 --> 20:05.160 type of query to get the shows of the tags with the tag community. You have to examine the 20:05.160 --> 20:11.000 app's table and the cross reference table and the tags to table in order to do this. And you need 20:11.000 --> 20:17.400 to make sure that for every row you get out of these various tables, you are making comparisons 20:17.400 --> 20:24.840 and to say that you want rows where the episode ID number matches the episode ID number in the 20:24.840 --> 20:32.280 cross reference table and similarly the tags ID number in the cross reference table matches the tag 20:32.280 --> 20:39.800 ID number in the tag table. So you're doing a sort of a set operation where the sets are overlapping 20:39.800 --> 20:45.640 between these three tables. And then when you've done that, you then say I want only the rows 20:45.640 --> 20:52.600 from this combination which have a tag of community. And you get back a list of which is the 20:52.600 --> 20:57.480 set which is actually not the same last time because since then the world's moved on and we've 20:57.480 --> 21:06.440 got more shows in the database and one of them I think it has got community as a tag. It's 21:06.440 --> 21:12.600 mine actually but the time of recording this hasn't come to the top of the queue but whatever 21:12.600 --> 21:19.800 at the time you hear this probably will have. So if you want to find shows with combination of tags 21:19.800 --> 21:28.200 against the same example as before we want to find shows which contain the tag community or the 21:28.200 --> 21:36.120 tag HPR and there's an example query how you would do this and the result as an HTML table. 21:36.120 --> 21:43.320 Again we get more back because other shows have come into the database. The shows, the previous 21:43.320 --> 21:50.600 two shows in this mini series appear and it's one because they both contain HPR as a tag. If we want 21:50.600 --> 21:58.120 to find shows which have community and HPR as tags then there's a difference the way the 21:58.120 --> 22:05.960 query is done which I have shown without a huge lot of explanation but you can see the result 22:05.960 --> 22:11.080 comes out of it. I don't think it's appropriate to drill down too deeply into this one. Now in the 22:11.080 --> 22:20.760 last episode I did a thing where I followed up groups' suggestion which was that if we select a show 22:21.720 --> 22:26.920 it was imagine that if we're looking at a particular show listening to a particular show it would 22:26.920 --> 22:34.440 be good to know which other shows share the same tag tags should say and I wrote a little bit of 22:34.440 --> 22:41.000 sequel to do this which I stored in a file and I shared the file with you. This time I haven't 22:41.000 --> 22:46.600 done the same thing. I've done the same query for the equivalent query but I haven't made it into 22:46.600 --> 22:54.840 a little SQL script. There are two queries. One is simply to get back the tags of a 22:54.840 --> 23:01.720 other relate to a given show. I'm using show 2071 as the target so we're assuming we're listening 23:01.720 --> 23:09.720 to show 2071 and that was Mr. X's show. I didn't actually know what it was in this one because I 23:09.720 --> 23:21.960 did in the last episode but it was actually. I think it was about his portable amateur radio device. 23:21.960 --> 23:29.640 Anyway the tags are amateur radio, electronics and open source and then the second query is one 23:29.640 --> 23:37.080 which scans the database for all shows which have any of those tags so and then it's actually 23:37.080 --> 23:42.680 listed them with the tag that it found so there's a batch of amateur radio or a batch of electric 23:42.680 --> 23:48.680 products and a batch of open source shows that came back from the query and again it's similar to 23:48.680 --> 23:54.600 what was done last time just different queries basically. Maybe more shows came back I didn't 23:54.600 --> 23:59.800 actually make a note how many we got back and then they get to all that important. Now in the last 23:59.800 --> 24:05.880 episode we looked at using regular expressions because there is that capability within MariaDB 24:05.880 --> 24:12.840 to find partial tags because all we've been doing so far is looking at whole tags but if we want to 24:12.840 --> 24:19.160 look at a partial tag then regular expressions what we need to use and I've got an example query 24:19.160 --> 24:28.280 which it is using the regular expression capability and it's using word boundary expression 24:28.280 --> 24:34.040 which I mentioned on the last episode so I won't go into detail. I've gone into tiny bit more 24:34.760 --> 24:42.280 detail of how this query works. It's scanning four tables scanning the X table it's scanning 24:42.280 --> 24:48.920 the host table because I thought it would be useful to get back the name of the author of the show 24:48.920 --> 24:56.680 it's scanning the joining table and the tags table tags too and the joining is done to make 24:56.680 --> 25:02.680 subsets of the tables in the way I mentioned earlier on. Then the regular expression part 25:02.680 --> 25:09.080 looks for the word ham so it's got word boundary before and after it. It's literally the word 25:09.080 --> 25:16.440 ham is a distinct word but it is and looking for it as a component of a tag. 25:16.440 --> 25:22.920 Or entire indeed the whole tag it needs if there is one but not as part of a word so a 25:22.920 --> 25:30.520 Birmingham would not come back from this query. It uses a group by thing which makes 25:30.520 --> 25:39.480 make sure that you get only one answer if the query matches the same episode twice one more. 25:39.480 --> 25:48.120 So we get back a list of shows and their dates and the titles and the host and then the tags 25:48.120 --> 25:55.000 that are associated with them and they all contain the word ham so things like ham radio. 25:55.000 --> 26:00.920 Ham as a tag actually in one case ham radio seems to be the most common one though ham radio 26:00.920 --> 26:08.040 without a space is not one of the matches looking through yeah amateur radio ham yeah 26:08.040 --> 26:14.920 ham space radio is the is the commonest one but ham just as a tag is also coming back. 26:14.920 --> 26:20.360 So there's some fan degree of sophistication that can be achieved by using this technique. 26:20.360 --> 26:26.360 Now I thought it would be useful just to finish off with a technique that's 26:26.360 --> 26:32.120 available within the database most databases offer this. This is a thing called a view. 26:32.120 --> 26:37.000 Can use it to hide away the complexity of some of the queries and there's certainly the 26:37.000 --> 26:41.880 case that the queries we're using this time around are more complex than the ones we used in the 26:41.880 --> 26:50.280 last episode. So I've created a view and I've called a view I should say is a piece of 26:50.280 --> 26:59.480 SQL SQL which is a means of storing away a select query one of those ones where you 26:59.880 --> 27:06.600 you're asking for particular rows to be returned out of the tables. So it's a way of storing 27:06.600 --> 27:14.760 away such a select query and then you can use the the name of that query to query again as if 27:14.760 --> 27:22.280 it's a table all of its own but behind the scenes the view query is being issued is being 27:22.280 --> 27:28.920 executed and the results of that are then being returned or subset of those results being returned. 27:28.920 --> 27:35.480 It's a sort of nested query type of thing. I've included the query that I came up with as a 27:35.480 --> 27:40.040 file if you want to look at it and it's also listed in the notes. I created a view called 27:40.040 --> 27:46.600 EHT view which is just a way of signifying that it relates to the app's table, the host's table 27:46.600 --> 27:53.800 and the tags table, tags two tables. I should say it queries these in a similar way to the the way 27:53.800 --> 28:01.160 we've done in previous queries and once it's there and stored away it can then be executed. 28:01.160 --> 28:08.600 I maybe explain a teeny bit more about it in a moment. Yeah this isn't really intended to be 28:08.600 --> 28:15.880 a database tutorial as such more a discussion of methodologies. So I'm using a I've created a 28:15.880 --> 28:24.200 query which is just listed in the notes here which uses the view. So it's doing a select from EHT 28:24.200 --> 28:31.800 underscore view and then it's saying where tag regular expression regaps and then it's looking 28:31.800 --> 28:40.040 for a the word solder SLDR with word boundaries before and after it and then grouping the result 28:40.040 --> 28:49.160 by the ID number that comes back. It gets back three shows which contain the tag a tag which is 28:49.160 --> 28:58.120 either solder or contains solder in the tag. So the first one is show nine four one which has 28:58.120 --> 29:06.120 got the tag solder. Second one is one or three seven Mr. X where he was he was giving a tutorial 29:06.120 --> 29:12.680 on soldering and it contains the tag solder and the third one he didn't use the tag solder 29:12.680 --> 29:18.600 and we're rather these tags actually. It can't remember who created them. It doesn't contain that tag 29:18.600 --> 29:25.720 but it contains multicore solder as a tag. So this thing fished the word solder at a multicore 29:25.720 --> 29:30.440 solder because it's a separate word and returned it which I think is the way you'd wanted to do. 29:31.000 --> 29:40.360 Now the view contains it it is a select which will do a query. It gets back all of all of the 29:40.360 --> 29:47.160 instances of episodes and hosts and tags. You then have to sort of subset that when you call it 29:47.160 --> 29:55.000 which is what the example shows but within this the view there is what's called a sub-select. 29:55.000 --> 30:02.760 So I think I did this I think I did an equivalent in the last episode and the sub-select is using 30:02.760 --> 30:10.040 a function in my SQL called group concat which looks for all instances in a table and 30:10.040 --> 30:17.080 concatenates them together with a comma. So the result the tag list that you see in the result 30:17.080 --> 30:24.600 is concatenated from the tags table all of the tags which relate that particular show. 30:24.600 --> 30:31.000 Again it's probably not a thing you would do for real but it was a demonstration more than anything 30:31.000 --> 30:37.960 else. Okay so we're now at the conclusion and my conclusion is that the HPR database needs 30:37.960 --> 30:43.800 a tag mechanism very much and we've looked at the present tag storage system in this many 30:43.800 --> 30:49.000 series so it is not a good way to do things. We've looked at a somewhat better way of doing it 30:49.000 --> 30:55.640 but I've concluded it has some drawbacks. This third example this third episode shows a better way 30:55.640 --> 31:03.400 of doing doing things in a relational database in a way that we represent the true relationship 31:03.400 --> 31:09.400 between the episodes and tags and that relationship is a many to many relationship. So you could say 31:09.400 --> 31:16.840 it's taking me three shows to get to a conclusion that Mike Bray drew in in one but I felt 31:16.840 --> 31:23.640 it was worth working through this in order to explain why and why not some why not use some of 31:23.640 --> 31:28.840 the other solutions. So I was going to require some work it's strongly recommended that we implement 31:28.840 --> 31:35.960 a tag scheme in the HPR database in the way that's been discussed in this show and we also enable 31:35.960 --> 31:43.000 the foreign key capabilities of MariaDB so that for the reasons I've mentioned along the way today 31:43.000 --> 31:51.960 and at the same time we look at doing similar upgrades to enable many to many relationships of 31:51.960 --> 31:58.600 hosts and episodes we have we don't have that when it comes to to host. I think that is as important 31:58.600 --> 32:04.120 if not more important than the tags thing. There's another one which is that there's a 32:04.120 --> 32:10.760 many some many relationship between episodes and series. Thinking episode will be could be a 32:10.760 --> 32:16.680 member of more than one series. I don't think it's as critical as the other ones mind you and 32:16.680 --> 32:22.680 prepare to be disagreed with on that one but it's definitely something you should look at if nothing 32:22.680 --> 32:29.400 else. Okay so as before I've got a little plea at the end saying please include tags in your 32:29.400 --> 32:37.480 shows and if you if you have a moment to add more tags to the missing shows the ones missing 32:37.480 --> 32:43.720 tag that should say then be very very much appreciated but other than that I've finished 32:43.720 --> 32:48.040 briefscival it relieved. Okay thanks everybody bye. 32:55.240 --> 32:59.320 You've been listening to Hecker Public Radio and Hecker Public Radio.org. 32:59.320 --> 33:05.000 We are a community podcast network that release the shows every weekday Monday through Friday. 33:05.000 --> 33:10.440 Today's show like all our shows was contributed by a mage beer listener like yourself. 33:10.440 --> 33:16.360 If you ever thought of recording a podcast and click on our contributing to find out how easy 33:16.360 --> 33:22.920 it really is. Hecker Public Radio was found by the digital.com and the informomicon computer club 33:22.920 --> 33:28.360 and is part of the binary revolution at bmf.com. If you have comments on today's show 33:28.360 --> 33:34.200 please email the host directly leave a comment on the website or record a follow up episode yourself 33:34.200 --> 33:42.760 on their otherwise status. Today's show is released on the creative comments, 33:42.760 --> 34:06.020 and we will see you in the next one.